This is one page of the R Handbook for Epidemiologists, but is being printed as a stand-alone page.
You can find the complete handbook on Github
Spatial aspects of your data can provide a lot of insights into the situation of the outbreak to answer questions such as:
In this section, we will explore basic spatial data visualization methods using tmap and ggplot2 packages. We will also walk through some of the basic spatial data management and querying methods with the sf package.
Choropleth map
Density heatmap
Health facility catchment area
Load packages
First, load the packages required for this analysis:
pacman::p_load(rio, # to import data
here, # to locate files
tidyverse, # to clean, handle, and plot the data (includes ggplot2 package)
sf, # to manage spatial data using a Simple Feature format
tmap,# to produce simple maps, works for both interactive and static maps
janitor, # to clean column names
OpenStreetMap # to add OSM basemap in ggplot map
) Sample case data
# import aggregated case counts of disease X
linelist <- rio::import(here::here("data", "linelist_cleaned.rds"))
linelist <- linelist[sample(nrow(linelist), 1000),]
# Create sf object
linelist_sf <-
linelist %>%
st_as_sf(coords = c("lon", "lat"), crs = 4326)Sierra Leone: Admin boundary shapefiles
Data downloaded from HDX: https://data.humdata.org/dataset/sierra-leone-all-ad-min-level-boundaries
# ADM3 level
sle_adm3 <-
sf::read_sf(here::here("data/shp", "sle_adm3.shp")) %>% janitor::clean_names() %>%
filter(admin2name %in% c("Western Area Urban", "Western Area Rural"))Sierra Leone: Population by ADM3
Data downloaded from HDX: https://data.humdata.org/dataset/sierra-leone-population
# Population by ADM3
sle_adm3_pop <-
read.csv(here::here("data/population", "sle_admpop_adm3_2020.csv")) %>% janitor::clean_names()Sierra Leone: Health facility data from OpenStreetMap
Data downloaded from HDX: https://data.humdata.org/dataset/hotosm_sierra_leone_health_facilities
The easiest way to plot the XY coordinates (points) is to draw a map directly from the sf object which we created in the preparation section.
tmap offers simple mapping capabilities for both static (plot mode) and interactive (view mode) with just a few lines of codes.
This blog provides a good comparison among different mapping options in R. https://rstudio-pubs-static.s3.amazonaws.com/324400_69a673183ba449e9af4011b1eeb456b9.html
tmap_mode("plot") # or "plot"
#tm_shape(sle_adm3, bbox = st_bbox(linelist_sf)) +
tm_shape(sle_adm3, bbox = c(-13.3,8.43, -13.2,8.5)) +
tm_polygons(col = "#F7F7F7") +
tm_borders(col = "#000000", lwd = 2) +
tm_text("admin3name") +
tm_shape(linelist_sf) + tm_dots(size=0.08, col='blue') Choropleth maps can be useful to visualize your data by pre-defined area usually by administrative unit or health area for outbreak response to be able to target resources for specific area high incidence rates for example.
The current linelist data does not contain any information about the administrative units. Although it is ideal to store such information during the initial data collection phase, we can also assign administrative units to individual cases based on their spatial relationships (i.e. point intersects with a polygon).
sf package offers various methods for spatial joins. See more documentation about the st_join method and spatial join types here: https://r-spatial.github.io/sf/reference/geos_binary_pred.html
Spatial assign administrative units to cases First spatially intersect our case locations (points) with the ADM3 boundaries (polygons)
linelist_adm <-
linelist_sf %>%
sf::st_join(sle_adm3, join = st_intersects) %>%
select(names(linelist_sf), admin3name, admin3pcod)
# Now you will see the ADM3 names attached to each case
linelist_adm %>% select(case_id, admin3name)
## Simple feature collection with 1000 features and 2 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -13.27095 ymin: 8.448387 xmax: -13.20545 ymax: 8.490746
## geographic CRS: WGS 84
## First 10 features:
## case_id admin3name geometry
## 3201 df9d8c West II POINT (-13.24674 8.470331)
## 345 2ae536 Mountain Rural POINT (-13.21605 8.452737)
## 242 730684 West II POINT (-13.23525 8.460555)
## 2244 9c1ae9 West III POINT (-13.26582 8.458813)
## 2087 ba638b East II POINT (-13.21384 8.478989)
## 575 c9b9ce Mountain Rural POINT (-13.21256 8.467732)
## 1071 676b4f Mountain Rural POINT (-13.22289 8.482292)
## 1020 01b12d West III POINT (-13.26422 8.452109)
## 571 8ba52d West II POINT (-13.24695 8.464259)
## 2158 dce5cc East II POINT (-13.22085 8.483679)Case counts by ADM3
case_adm3 <-
linelist_adm %>% as_tibble() %>%
#filter(!is.na(admin3pcod)) %>%
group_by(admin3pcod, admin3name) %>%
summarise(cases = n()) %>%
arrange(desc(cases))
case_adm3
## # A tibble: 9 x 3
## # Groups: admin3pcod [9]
## admin3pcod admin3name cases
## <chr> <chr> <int>
## 1 SL040102 Mountain Rural 308
## 2 SL040208 West III 213
## 3 SL040207 West II 172
## 4 SL040204 East II 110
## 5 SL040203 East I 64
## 6 SL040201 Central I 47
## 7 SL040206 West I 40
## 8 SL040205 East III 24
## 9 SL040202 Central II 22Choropleth mapping Now that we have the administrative unit names assigned to all cases, we can start mapping the case counts by area (choropleth maps).
Since we also have population data by ADM3, we can add this information to the case_adm3 table created previously.
# Add population data and calculate cases per 10K population
case_adm3 <-
case_adm3 %>%
left_join(sle_adm3_pop, by=c("admin3pcod"="adm3_pcode")) %>%
select(names(case_adm3), total) %>%
mutate(case_10kpop = round(cases/total * 10000, 3))
case_adm3
## # A tibble: 9 x 5
## # Groups: admin3pcod [9]
## admin3pcod admin3name cases total case_10kpop
## <chr> <chr> <int> <int> <dbl>
## 1 SL040102 Mountain Rural 308 33993 90.6
## 2 SL040208 West III 213 210252 10.1
## 3 SL040207 West II 172 145109 11.9
## 4 SL040204 East II 110 99821 11.0
## 5 SL040203 East I 64 68284 9.37
## 6 SL040201 Central I 47 69683 6.74
## 7 SL040206 West I 40 60186 6.65
## 8 SL040205 East III 24 500134 0.48
## 9 SL040202 Central II 22 23874 9.22Join this table with the ADM3 polygons for mapping
# Add population data and calculate cases per 10K population
case_adm3_sf <-
case_adm3 %>%
left_join(sle_adm3, by="admin3pcod") %>%
select(objectid, admin3pcod, admin3name=admin3name.x, admin2name, admin1name,
cases, total, case_10kpop, geometry) %>%
st_as_sf()Mapping the results
# Number of cases
tmap_mode("plot")
tm_shape(case_adm3_sf) +
tm_polygons("cases") +
tm_text("admin3name")# Cases per 10K population
tmap_mode("plot")
tm_shape(case_adm3_sf) +
tm_polygons("case_10kpop",
breaks=c(0, 10, 50, 100),
palette = "Purples"
) +
tm_text("admin3name")We can also look at the combination of time and space by facetting the heatmaps.
Set parameters for the basemap using the OpenStreetMap package.
# Fit basemap by range of lat/long coordinates. Choose tile type
map <- openmap(c(max(linelist$lat, na.rm=T), max(linelist$lon, na.rm=T)), # limits of tile
c(min(linelist$lat, na.rm=T), min(linelist$lon, na.rm=T)),
zoom = NULL,
type = c("osm", "stamen-toner", "stamen-terrain","stamen-watercolor", "esri","esri-topo")[1],
mergeTiles = TRUE)
# Projection WGS84
map.latlon <- openproj(map, projection = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")Heatmap by month of onset
# Extract month of onset
linelist$date_onset_ym <- format(linelist$date_onset, "%Y-%m")
# Simply facet above map by month of onset
# Plot map. Must be autoplotted to work with ggplot
OpenStreetMap::autoplot.OpenStreetMap(map.latlon)+
# Density tiles
ggplot2::stat_density_2d(aes(x = lon,
y = lat,
fill = ..level..,
alpha=..level..),
bins = 10,
geom = "polygon",
contour_var = "count",
data = linelist %>% filter(date_onset>='2014-08-01' & date_onset<='2015-01-31'),
show.legend = F) +
#scale_fill_gradient(low = "black", high = "red")+
labs(x = "Longitude",
y = "Latitude",
title = "Distribution of simulated cases by month of onset") +
facet_wrap(~ date_onset_ym, ncol = 3)It might be useful to know where the health facilities are located in relation to the disease hot spots.
Finding the nearest health facility We can use the st_nearest_feature method from the sf package to assign the cloest health facility to individual cases.
# Closet health facility to each case
linelist_sf_hf <-
linelist_sf %>%
st_join(sle_hf, join = st_nearest_feature) %>%
select(case_id, osm_id, name, amenity)We can see that “Den Clinic” is the closest health facility for about ~30% of the cases.
# Group cases by health facility
hf_catchment <-
linelist_sf_hf %>% as.data.frame() %>%
group_by(name) %>%
summarise(case_n = n()) %>%
arrange(desc(case_n))
hf_catchment
## # A tibble: 8 x 2
## name case_n
## <chr> <int>
## 1 Den Clinic 339
## 2 Shriners Hospitals for Children 338
## 3 GINER HALL COMMUNITY HOSPITAL 179
## 4 panasonic 53
## 5 Princess Christian Maternity Hospital 37
## 6 <NA> 22
## 7 ARAB EGYPT CLINIC 21
## 8 MABELL HEALTH CENTER 11Visualizing the results on the map
tmap_mode("view")
tm_shape(linelist_sf_hf) + tm_dots(size=0.08, col='name') +
tm_shape(sle_hf) + tm_dots(size=0.3, col='red') + tm_text("name") +
tm_view(set.view = c(-13.2284,8.4699, 13), set.zoom.limits = c(13,14))Cases within 30 mins Walking distance from the closest health facility
We can also explore how many cases are located within 2.5km (~30 mins) walking distance from the closest health facility.
Note: For more accurate distance calculations, it is better to re-project your sf object to the respective local map projection system such as UTM (Earth projected onto a planar surface). In this example, for simplicity we will stick to the World Geodetic System (WGS84) Geograhpic coordinate system (Earth represented in a spherical / round surface, therefore the units are in decimal degrees). We will use a general conversion of: 1 decimal degree = ~111km.
See more information about map projections and coordinate systems: https://www.esri.com/arcgis-blog/products/arcgis-pro/mapping/gcs_vs_pcs/
First create a circular buffer with a radius of ~2.5km aroudn each health facility
Intersect this with the cases
# Intersect the cases with the buffers
linelist_sf_hf_2k <-
linelist_sf_hf %>%
st_join(sle_hf_2k, join = st_intersects, left = TRUE) %>%
filter(osm_id.x==osm_id.y | is.na(osm_id.y)) %>%
select(case_id, osm_id.x, name.x, amenity.x, osm_id.y)Count the results
202 out of 1000 cases (20.2%, shown in red dots in the map below) live more than 30 mins away from the nearest health facility)
nrow(linelist_sf_hf_2k)
## [1] 1000
nrow(linelist_sf_hf_2k[is.na(linelist_sf_hf_2k$osm_id.y),])
## [1] 197Visualize the results
tmap_mode("view")
tm_shape(linelist_sf_hf) + tm_dots(size=0.08, col='name') +
tm_shape(sle_hf_2k) + tm_borders(col = "red", lwd = 2) +
tm_shape(linelist_sf_hf_2k[is.na(linelist_sf_hf_2k$osm_id.y),]) +tm_dots(size=0.1, col='red') +
tm_view(set.view = c(-13.2284,8.4699, 13), set.zoom.limits = c(13,14))R Simple Features and sf package https://cran.r-project.org/web/packages/sf/vignettes/sf1.html
R tmap package https://cran.r-project.org/web/packages/tmap/vignettes/tmap-getstarted.html
ggmap: Spatial Visualization with ggplot2 https://journal.r-project.org/archive/2013-1/kahle-wickham.pdf